Session F-3

Scheduling 1

Conference
10:00 AM — 11:30 AM EDT
Local
May 4 Wed, 9:00 AM — 10:30 AM CDT

AutoByte: Automatic Configuration for Optimal Communication Scheduling in DNN Training

Yiqing Ma (HKUST, China); Hao Wang (HKUST, Hong Kong); Yiming Zhang (NUDT & NiceX Lab, China); Kai Chen (Hong Kong University of Science and Technology, China)

ByteScheduler partitions and rearranges tensor transmissions to improve the communication efficiency of distributed Deep Neural Network (DNN) training. The configuration of hyper-parameters (i.e., the partition size and the credit size) is critical to the effectiveness of partitioning and rearrangement. Currently, ByteScheduler adopts Bayesian Optimization (BO) to find the optimal hyper-parameter configuration beforehand. In practice, however, various runtime factors (such as worker node status and network conditions) change over time, making the statically determined one-shot configuration suboptimal for real-world DNN training.
To address this problem, we present a real-time configuration method (called AutoByte) that automatically and promptly searches for the optimal hyper-parameters as the training system dynamically changes. AutoByte extends the ByteScheduler framework with a meta-network, which takes the system's runtime statistics as input and predicts the speedup under specific configurations. Evaluation results on various DNN models show that AutoByte can dynamically tune the hyper-parameters with low resource usage and deliver up to 33.2% higher performance than the best static configuration method on the ByteScheduler framework.
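
A minimal sketch of the meta-network idea described in the abstract, assuming hypothetical runtime-statistics features and a small PyTorch regressor; the actual AutoByte architecture, inputs, and search procedure may differ.

```python
# Hypothetical sketch: a meta-network that maps runtime statistics plus a candidate
# (partition_size, credit_size) configuration to a predicted speedup, and a helper
# that picks the best configuration from a candidate grid. Feature layout and
# network shape are illustrative, not AutoByte's actual design.
import torch
import torch.nn as nn

class SpeedupPredictor(nn.Module):
    def __init__(self, num_runtime_stats: int):
        super().__init__()
        # input = runtime stats (e.g., worker load, link bandwidth) + 2 hyper-parameters
        self.net = nn.Sequential(
            nn.Linear(num_runtime_stats + 2, 64), nn.ReLU(),
            nn.Linear(64, 64), nn.ReLU(),
            nn.Linear(64, 1),   # predicted speedup under this configuration
        )

    def forward(self, runtime_stats, partition_size, credit_size):
        x = torch.cat([runtime_stats, partition_size, credit_size], dim=-1)
        return self.net(x)

def pick_config(model, runtime_stats, candidates):
    """Return the (partition_size, credit_size) pair with the highest predicted speedup.
    runtime_stats is a 1-D float tensor; candidates is an iterable of (p, c) pairs."""
    best, best_speedup = None, float("-inf")
    for p, c in candidates:
        pred = model(runtime_stats,
                     torch.tensor([p], dtype=torch.float32),
                     torch.tensor([c], dtype=torch.float32)).item()
        if pred > best_speedup:
            best, best_speedup = (p, c), pred
    return best
```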

Joint Near-Optimal Age-based Data Transmission and Energy Replenishment Scheduling at Wireless-Powered Network Edge

Quan Chen (Guangdong University of Technology, China); Zhipeng Cai (Georgia State University, USA); Lianglun Cheng and Feng Wang (Guangdong University of Technology, China); Hong Gao (Harbin Institute of Technology, China)

Age of Information (AoI), which has emerged as a new metric for quantifying data freshness, has attracted increasing interest recently. Most existing works try to optimize the system AoI from the perspective of data transmission. Unfortunately, at the wireless-powered network edge, the charging schedule of the source nodes must also be decided in addition to data transmission. Thus, in this paper, we investigate the joint scheduling problem of data transmission and energy replenishment to optimize the peak AoI at the network edge with directional chargers. To the best of our knowledge, this is the first work that considers these two problems simultaneously.
First, the theoretical bounds of the peak AoI with respect to the charging latency are derived. Second, for the minimum peak AoI scheduling problem with a single charger, an optimal scheduling algorithm is proposed to minimize the charging latency, and a data transmission scheduling strategy is then given to optimize the peak AoI. The proposed algorithm is proved to have a constant approximation ratio of at most 1.5. When there are multiple chargers, an approximation algorithm is also proposed to minimize the charging latency and peak AoI. Finally, simulation results verify the high performance of the proposed algorithms in terms of AoI.
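
For readers unfamiliar with the metric, the following is a small helper that evaluates the peak AoI of one source under the standard sawtooth age definition; the paper's specific model (charging latency, directional chargers) is not reproduced, and the update list format is an assumption made only for this illustration.

```python
# Hypothetical helper: compute the peak Age of Information for one source, given
# (generation_time, delivery_time) pairs of its updates in delivery order. The age
# at the sink peaks just before a fresh update arrives, having grown since the
# previously delivered update was generated.
def peak_aoi(updates):
    """updates: list of (generation_time, delivery_time), sorted by delivery_time."""
    peak = 0.0
    prev_gen = None
    for gen, deliv in updates:
        if prev_gen is not None:
            peak = max(peak, deliv - prev_gen)
        prev_gen = gen
    return peak

# Example: updates generated at t=0 and t=4, delivered at t=1 and t=6.
# The age just before the second delivery is 6 - 0 = 6.
print(peak_aoi([(0, 1), (4, 6)]))  # 6
```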

Kalmia: A Heterogeneous QoS-aware Scheduling Framework for DNN Tasks on Edge Servers

Ziyan Fu and Ju Ren (Tsinghua University, China); Deyu Zhang (Central South University, China); Yuezhi Zhou and Yaoxue Zhang (Tsinghua University, China)

Motivated by the popularity of edge intelligence, DNN services have been widely deployed at the edge, posing significant performance pressure on edge servers. How to improve the QoS of edge DNN services thus becomes a crucial and challenging problem. Previous works, however, did not fully consider the heterogeneous QoS requirements of urgent and non-urgent tasks, causing frequent QoS violations. Meanwhile, our empirical study shows that severe task interference exists among concurrent DNN tasks, further degrading the timeliness of urgent tasks and the throughput of non-urgent tasks. To address these issues, we propose Kalmia, a heterogeneous QoS-aware framework for DNN inference task scheduling on edge servers. Specifically, Kalmia includes an offline profiling stage and an online scheduling policy. In offline profiling, we build a regression model to predict the execution time of tasks. During online scheduling, we classify tasks into urgent and non-urgent ones and distribute them into two CUDA contexts. Through a tailored scheduling strategy, non-urgent tasks can fully utilize the computing resources to improve throughput, while the timeliness of urgent tasks is guaranteed via preemption. Experimental results demonstrate that Kalmia achieves up to 2.8× improvement in throughput and significantly reduces the deadline violation rate compared with state-of-the-art methods.
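
An illustrative sketch of the two-queue idea described above: tasks whose predicted finish time leaves little slack before their deadline are treated as urgent and served first (in the real system, via a separate CUDA context with preemption), while the rest are batched for throughput. The slack threshold and the execution-time predictor interface are hypothetical, not Kalmia's.

```python
# Sketch of urgent/non-urgent classification based on an offline-profiled
# execution-time predictor; values and structures are illustrative only.
import heapq
import itertools
import time

URGENCY_SLACK = 0.05  # seconds of remaining slack below which a task is urgent (assumed)

class TwoQueueScheduler:
    def __init__(self, predict_exec_time):
        self.predict_exec_time = predict_exec_time  # offline-profiled regression model
        self.urgent = []        # min-heap ordered by deadline
        self.non_urgent = []    # FIFO queue for throughput-oriented batching
        self._tie = itertools.count()

    def submit(self, task):
        slack = task["deadline"] - time.time() - self.predict_exec_time(task)
        if slack < URGENCY_SLACK:
            heapq.heappush(self.urgent, (task["deadline"], next(self._tie), task))
        else:
            self.non_urgent.append(task)

    def next_task(self):
        # Urgent tasks always go first; non-urgent tasks fill the remaining capacity.
        if self.urgent:
            return heapq.heappop(self.urgent)[2]
        return self.non_urgent.pop(0) if self.non_urgent else None
```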

Subset Selection for Hybrid Task Scheduling with General Cost Constraints

Yu Sun, Chi Lin, Jiankang Ren, Pengfei Wang, Lei Wang, Guowei Wu and Qiang Zhang (Dalian University of Technology, China)

The subset selection problem for task scheduling with general cost constraints arises widely in IoT applications. Its objective is to select several profitable tasks to execute under routing and cost constraints such that the total profit is maximized. Most prior work focuses only on either online tasks or offline tasks, which is usually inapplicable in practical settings where online and offline tasks co-exist. In this paper, we study the subset selection problem for HybrId Task Scheduling with general cost constraints (HITS), in which both online and offline tasks are scheduled to maximize the overall profit. We first divide the HITS problem into online and offline subproblems and propose two algorithms to solve them with bounded approximation ratios. Furthermore, we propose an approximation algorithm for the hybrid scenario where both online and offline tasks are considered. Extensive simulations show that our proposed algorithm outperforms baseline algorithms by 21.5% on average in profit and also performs well in pure online/offline scenarios. We further demonstrate the feasibility of our algorithm through testbed experiments in a realistic setting.
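
To make the problem concrete, here is a generic profit-to-cost greedy for a single-budget, offline flavor of subset selection. It is only a baseline sketch under simplified assumptions (one scalar cost, no routing constraint); the paper's algorithms and approximation guarantees differ.

```python
# Toy baseline: pick tasks by profit/cost ratio until the budget is exhausted.
def greedy_subset(tasks, budget):
    """tasks: list of (task_id, profit, cost); returns selected ids and total profit."""
    chosen, total_profit, spent = [], 0.0, 0.0
    for tid, profit, cost in sorted(tasks, key=lambda t: t[1] / t[2], reverse=True):
        if spent + cost <= budget:
            chosen.append(tid)
            total_profit += profit
            spent += cost
    return chosen, total_profit

print(greedy_subset([("a", 10, 4), ("b", 6, 3), ("c", 7, 5)], budget=8))
# -> (['a', 'b'], 16.0) with these illustrative numbers
```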

Session Chair

Yusheng Ji (National Institute of Informatics)

Session F-4

Scheduling 2

Conference
12:00 PM — 1:30 PM EDT
Local
May 4 Wed, 11:00 AM — 12:30 PM CDT

EdgeTuner: Fast Scheduling Algorithm Tuning for Dynamic Edge-Cloud Workloads and Resources

Rui Han, Shilin Wen, Chi Harold Liu, Ye Yuan and Guoren Wang (Beijing Institute of Technology, China); Lydia Y. Chen (IBM Zurich Research Laboratory, Switzerland)

Edge-cloud jobs are rapidly becoming prevalent in many application domains, posing the challenge of using both resource-strenuous edge devices and elastic cloud resources. Efficient resource allocation for such jobs via scheduling algorithms is essential to guarantee their performance, e.g., latency. Deep reinforcement learning (DRL) is increasingly adopted to make scheduling decisions but faces the conundrum of achieving high rewards at a low training overhead. It remains unknown whether such DRL can be applied to tune, in a timely manner, the scheduling algorithms adopted in response to fast-changing workloads and resources. In this paper, we propose EdgeTuner, which effectively leverages DRL to select scheduling algorithms online for edge-cloud jobs. The enabling features of EdgeTuner are a sophisticated DRL model that captures the complex dynamics of edge-cloud jobs/tasks and an effective simulator that emulates the response times of short-running jobs in accordance with dynamically changing scheduling algorithms. EdgeTuner trains DRL agents offline by directly interacting with the simulator. We implement EdgeTuner on the Kubernetes scheduler and extensively evaluate it on a Kubernetes cluster testbed driven by production traces. Our results show that EdgeTuner outperforms prevailing scheduling algorithms by achieving significantly lower job response times while accelerating DRL training by more than 180x.
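
A sketch of the offline interaction loop implied above: a DRL agent picks one of several scheduling algorithms per decision epoch and is rewarded with the negative job response time reported by a cluster simulator. The agent and simulator interfaces, and the scheduler list, are hypothetical placeholders rather than EdgeTuner's implementation.

```python
# Offline training loop against a simulator; agent.act/agent.learn and
# simulator.reset/step are assumed interfaces for illustration only.
SCHEDULERS = ["fifo", "shortest-job-first", "fair", "capacity"]

def train(agent, simulator, episodes=1000):
    for _ in range(episodes):
        state = simulator.reset()          # workload + edge/cloud resource snapshot
        done = False
        while not done:
            action = agent.act(state)                         # index into SCHEDULERS
            next_state, response_time, done = simulator.step(SCHEDULERS[action])
            agent.learn(state, action, -response_time, next_state, done)
            state = next_state
```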

Optimizing Task Placement and Online Scheduling for Distributed GNN Training Acceleration

Ziyue Luo, Yixin Bao and Chuan Wu (The University of Hong Kong, Hong Kong)

Training Graph Neural Networks (GNNs) on large graphs is resource-intensive and time-consuming, mainly because the large graph data cannot fit into the memory of a single machine and have to be fetched from distributed graph storage and processed on the go. Unlike distributed deep neural network (DNN) training, the bottleneck in distributed GNN training lies largely in the transmission of large graph data for constructing mini-batches of training samples. Existing solutions often advocate data-computation colocation and do not work well with limited resources where colocation is infeasible. The potential of strategic task placement and optimal scheduling of data transmission and task execution has not been well explored. This paper designs an efficient algorithm framework for task placement and execution scheduling of distributed GNN training, to improve resource utilization and execution pipelining and to expedite training completion. Our framework consists of two modules: (i) an online scheduling algorithm that schedules the execution of training tasks and the data transmission plan; and (ii) an exploratory task placement scheme that decides the placement of each training task. We conduct thorough theoretical analysis, testbed experiments and simulation studies, and observe up to 67% training speed-up with our algorithm compared to representative baselines.

Payment Channel Networks: Single-Hop Scheduling for Throughput Maximization

Nikolaos Papadis and Leandros Tassiulas (Yale University, USA)

Payment channel networks (PCNs) have emerged as a scalability solution for blockchains built on the concept of a payment channel: a setting that allows two parties to safely transact with each other at high frequency by updating pre-committed balances. Transaction requests in PCNs may be declined because of unavailability of funds due to a temporarily uneven distribution of the channel balances. In this paper, we investigate how to alleviate unnecessary payment blockage via proper prioritization of the transaction execution order. Specifically, we consider the scheduling problem in a payment channel: as transactions continuously arrive on both sides, nodes need to decide which ones to process and when, in order to maximize channel throughput. We introduce a stochastic model to capture the dynamics of a payment channel under discrete stochastic arrivals, with incoming transactions potentially held in buffers up until some deadline in order to enable more elaborate processing decisions. We describe a scheduling policy that maximizes the channel success rate/throughput, formally prove its optimality for fixed-amount transactions, and also show its superiority in the case of heterogeneous amounts via experiments in our discrete-event simulator. Overall, our work is a step in the direction of formal research on improving PCN performance.
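
A toy model of a single payment channel with buffered transactions, illustrating why processing order matters: executing a payment in one direction replenishes the balance available for the other direction. The hold-and-release heuristic below is an assumption for illustration, not the optimal policy proved in the paper.

```python
# Minimal two-party channel model; amounts and the ordering heuristic are illustrative.
class Channel:
    def __init__(self, balance_a, balance_b):
        self.balance = {"A": balance_a, "B": balance_b}

    def try_pay(self, src, amount):
        dst = "B" if src == "A" else "A"
        if self.balance[src] >= amount:
            self.balance[src] -= amount
            self.balance[dst] += amount   # funds become usable in the other direction
            return True
        return False

def process(channel, buffered):
    """buffered: list of (src, amount); serve sources with more balance first so that
    their payments free up funds for the opposite direction."""
    buffered.sort(key=lambda tx: channel.balance[tx[0]], reverse=True)
    executed = 0
    for src, amount in buffered:
        executed += channel.try_pay(src, amount)
    return executed
```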

Shield: Safety Ensured High-efficient Scheduling for Magnetic MIMO Wireless Power Transfer System

Wangqiu Zhou, Hao Zhou, Xiaoyu Wang, Kaiwen Guo, Haisheng Tan and Xiang-Yang Li (University of Science and Technology of China, China)

Recently, techniques such as magnetic resonant coupling (MRC) and multiple-input multiple-output (MIMO) transmission have significantly improved the charging efficiency and distance of wireless power transfer (WPT) systems. However, the electromagnetic radiation (EMR) safety of wireless charging is critical in practice yet mostly ignored. In this work, we take EMR safety into account in MIMO MRC-WPT systems and propose a safety-ensured, high-efficiency scheduling algorithm for magnetic MIMO wireless power transfer systems (called Shield). Technically, we first devise a simple but accurate Z-axis rotationally symmetric EMR model along with a magnetic-field-line-based meshing scheme. We then express the EMR safety requirement in the continuous physical space with a limited number of constraints via random sampling and rule-based filtering. Finally, we build a system prototype of Shield and conduct extensive experiments. Under a given power budget and resonant frequency, the results reveal that the EMR safety requirement only influences the charging performance of an MRC-WPT system within a certain range. Furthermore, Shield can dramatically improve the payload power transfer efficiency (PTE), by up to 66.60%, compared with state-of-the-art baselines while guaranteeing EMR safety.
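
A sketch of the general idea of turning a continuous EMR-safety requirement into finitely many constraints: sample points in the charging region, estimate field strength with a placeholder radiation model, and keep only near-critical points as active constraints. The model, limit, and filtering margin are assumptions, not Shield's actual EMR model or rules.

```python
# Random sampling plus a simple rule-based filter over a 3-D region; illustrative only.
import random

EMR_LIMIT = 1.0        # normalized safety threshold (assumed)
KEEP_MARGIN = 0.8      # rule-based filter: keep points within 80% of the limit (assumed)

def sample_constraints(emr_model, region, num_samples=1000):
    """emr_model(point) -> estimated EMR; region: ((xmin,xmax), (ymin,ymax), (zmin,zmax))."""
    constraints = []
    for _ in range(num_samples):
        point = tuple(random.uniform(lo, hi) for lo, hi in region)
        if emr_model(point) >= KEEP_MARGIN * EMR_LIMIT:
            # Only near-critical points need to constrain the charging optimizer.
            constraints.append((point, EMR_LIMIT))
    return constraints
```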

Session Chair

Peshal Nayak (Samsung Research America)

Session F-5

Caching

Conference
2:30 PM — 4:00 PM EDT
Local
May 4 Wed, 1:30 PM — 3:00 PM CDT

Caching-based Multicast Message Authentication in Time-critical Industrial Control Systems

Utku Tefek (Advanced Digital Sciences Center, Singapore & University of Illinois Urbana-Champaign, USA); Ertem Esiner (Advanced Digital Sciences Center, Singapore); Daisuke Mashima (Advanced Digital Sciences Center & National University of Singapore, Singapore); Binbin Chen (Singapore University of Technology and Design, Singapore); Yih-Chun Hu (University of Illinois at Urbana-Champaign, USA)

Attacks against industrial control systems (ICSs) often exploit the insufficiency of authentication mechanisms. Verifying whether the received messages are intact and issued by legitimate sources can prevent malicious data/command injection by illegitimate or compromised devices. However, the key challenge is to introduce message authentication for various ICS communication models, including multicast or broadcast, with a messaging rate that can be as high as thousands of messages per second, within very stringent latency constraints. For example, certain commands for protection in smart grids must be delivered within 2 milliseconds, ruling out public-key cryptography. This paper proposes two lightweight message authentication schemes, named CMA and its multicast variant CMMA, that perform precomputation and caching to authenticate future messages. With minimal precomputation and communication overhead, C(M)MA eliminates all cryptographic operations for the source after the message is given, and all expensive cryptographic operations for the destinations after the message is received. C(M)MA considers the urgency profile (or likelihood) of a set of future messages for even faster verification of the most time-critical (or likely) messages. We demonstrate the feasibility of C(M)MA in an ICS setting based on a substation automation system in smart grids.
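
The following is a toy illustration of the precompute-and-cache principle using a shared-key HMAC: tags for a set of anticipated commands are computed ahead of time, so no cryptographic work remains on the source once a command must be sent, and destinations verify against a cache ordered by urgency. This is not the actual C(M)MA construction; the key, message set, and cache layout are assumptions for illustration only.

```python
# Toy precompute-and-cache authentication; NOT the C(M)MA scheme itself.
import hashlib
import hmac

KEY = b"shared-symmetric-key"   # placeholder; real key distribution is out of scope

def precompute(anticipated_messages):
    """Cache (message, tag) pairs for byte-string messages, most urgent first."""
    return [(msg, hmac.new(KEY, msg, hashlib.sha256).digest())
            for msg in anticipated_messages]

def send(cache, msg):
    # Source side: after the message is given, only a cache lookup remains.
    for cached_msg, tag in cache:
        if cached_msg == msg:
            return msg, tag
    raise KeyError("message was not precomputed")

def verify(cache, msg, tag):
    # Destination side: the most urgent messages sit first, so they verify earliest.
    for cached_msg, cached_tag in cache:
        if cached_msg == msg:
            return hmac.compare_digest(cached_tag, tag)
    return False
```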

Distributed Cooperative Caching in Unreliable Edge Environments

Yu Liu, Yingling Mao, Xiaojun Shang, Zhenhua Liu and Yuanyuan Yang (Stony Brook University, USA)

Caching popular contents at the network edge is promising for reducing retrieval latency, network congestion, and the number of requests to the remote content provider during peak hours. In general, edge caching resources are costly and highly limited. Nevertheless, it is possible to provide cost-effective caching services using unreliable resources, i.e., resources reserved for other applications but not fully used, or resources on vulnerable servers. In this paper, we consider the problem of caching popular contents over unreliable resources as a less expensive remedy for limited edge caching capacity. In particular, to address the unreliability of edge resources, erasure coding is leveraged to increase the availability of cached contents. We formulate the problem as a discrete optimization problem and prove that it is NP-hard. We start with two special cases of the problem and provide optimal algorithms for them. We then design an algorithm for the general version of the problem and provide a provable performance guarantee. Extensive real-world data-driven simulations demonstrate that the proposed algorithms significantly outperform popular baselines, and the algorithm for the general version of the problem is near-optimal.
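
As a quick back-of-the-envelope check of why erasure coding helps on unreliable nodes: with an (n, k) code, a content item stays retrievable as long as any k of its n coded chunks survive. The calculation below treats node availability as independent with probability p, which is an assumption made only for this illustration.

```python
# Availability of an (n, k)-coded item when each chunk survives independently with prob. p.
from math import comb

def availability(n, k, p):
    """Probability that at least k of n independently available chunks survive."""
    return sum(comb(n, i) * p**i * (1 - p)**(n - i) for i in range(k, n + 1))

# Same 2x storage overhead: 2 full replicas (need any 1) vs. an (8, 4) code (need any 4).
print(availability(2, 1, 0.9))  # 0.99
print(availability(8, 4, 0.9))  # ~0.9996
```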

Online File Caching in Latency-Sensitive Systems with Delayed Hits and Bypassing

Chi Zhang, Haisheng Tan and Guopeng Li (University of Science and Technology of China, China); Zhenhua Han (Microsoft Research Asia, China); Shaofeng H.-C. Jiang (Peking University, China); Xiang-Yang Li (University of Science and Technology of China, China)

In latency-sensitive file caching systems such as Content Delivery Networks (CDNs) and Mobile Edge Computing (MEC), the latency of fetching a missing file into the local cache can be significant. Recent studies have revealed that successive requests for the same missing file arriving before the fetch completes can still suffer latency (so-called delayed hits).

Motivated by these practical scenarios, we study the online general file caching problem with delayed hits and bypassing, i.e., a request may be bypassed and processed directly at the remote data center. The objective is to minimize the total request latency. We show a general reduction that turns a traditional file caching algorithm into one that can handle delayed hits. With this reduction, we give an O(Z^{3/2} log K)-competitive algorithm called CaLa, where Z is the maximum fetching latency of any file and K is the cache size, and we show a nearly tight lower bound of Ω(Z log K) for our ratio. Extensive simulations based on a production data trace from Google and the Yahoo benchmark illustrate that CaLa can reduce the latency by up to 9.42% compared with the state-of-the-art scheme dealing with delayed hits without bypassing, and this improvement increases to 32.01% if bypassing is allowed.
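
A small latency-accounting sketch of delayed hits and bypassing: a request for a file that is still being fetched waits only for the remaining fetch time, while a bypassed request is served directly from the remote side at a fixed cost. Cache admission/eviction is omitted, and the latency values are illustrative assumptions, not the paper's model parameters.

```python
# Per-request latency under delayed hits; illustrative constants and simplified bookkeeping.
FETCH_LATENCY = 10    # Z: time to bring a missing file into the cache (assumed)
BYPASS_LATENCY = 12   # cost of serving a request directly from the data center (assumed)

def request_latency(t, file_id, cache, in_flight, bypass=False):
    """cache: set of cached file ids; in_flight: dict file_id -> fetch completion time."""
    if file_id in cache:
        return 0                                   # true hit
    if file_id in in_flight:
        return max(0, in_flight[file_id] - t)      # delayed hit: wait for the ongoing fetch
    if bypass:
        return BYPASS_LATENCY                      # skip caching entirely
    in_flight[file_id] = t + FETCH_LATENCY         # miss: start fetching the file
    return FETCH_LATENCY
```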

Retention-aware Container Caching for Serverless Edge Computing

Li Pan (Huazhong University of Science and Technology, China); Lin Wang (VU Amsterdam & TU Darmstadt, The Netherlands); Shutong Chen and Fangming Liu (Huazhong University of Science and Technology, China)

Serverless edge computing adopts an event-based model where Internet-of-Things (IoT) services are executed in lightweight containers only when requested, leading to significantly improved edge resource utilization. Unfortunately, the startup latency of containers dramatically degrades the responsiveness of IoT services. Container caching, while masking this latency, requires retaining resources and thus compromises resource efficiency. In this paper, we study the retention-aware container caching problem in serverless edge computing. We leverage the distributed and heterogeneous nature of edge platforms and propose to optimize container caching jointly with request distribution. We reveal step by step that this joint optimization problem can be mapped to the classic ski-rental problem. We first present an online competitive algorithm for a special case, in which request distribution and container caching are based on a set of carefully designed probability distribution functions. Based on this algorithm, we propose O-RDC, an online algorithm with performance guarantees for the general case, which incorporates the resource capacity and network latency by opportunistically distributing requests. We conduct extensive experiments to examine the performance of the proposed algorithms. Our results show that O-RDC outperforms existing caching strategies of current serverless computing platforms by up to 94.5% in terms of overall system cost.
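
The ski-rental connection in a nutshell: keeping an idle container warm costs a keep-alive price per unit time ("renting"), while releasing it and paying a cold start on the next request costs a fixed startup price ("buying"). The deterministic break-even rule below is the classic 2-competitive answer to this sub-problem; O-RDC's randomized, distribution-based policy and its joint request distribution are not reproduced here, and the cost names are assumptions.

```python
# Classic ski-rental break-even rule applied to container retention; illustrative only.
def retention_time(startup_cost, keep_alive_cost):
    """How long to keep an idle container cached after its last request."""
    return startup_cost / keep_alive_cost

def should_keep(idle_time, startup_cost, keep_alive_cost):
    # Keep the container while the accumulated retention cost is still below the
    # cost of one cold start; release it afterwards.
    return idle_time < retention_time(startup_cost, keep_alive_cost)
```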

Session Chair

Jian Li (Binghamton University)

Session F-6

Low Latency

Conference
4:30 PM — 6:00 PM EDT
Local
May 4 Wed, 3:30 PM — 5:00 PM CDT

Dino: A Block Transmission Protocol with Low Bandwidth Consumption and Propagation Latency

Zhenxing Hu and Zhen Xiao (Peking University, China)

Block capacity plays a critical role in maintaining blockchain security and improving transaction throughput (TPS). Increasing block capacity can help attain higher TPS, but it also increases block propagation latency and degrades system security. Existing work that compresses blocks to shorten propagation latency introduces an undesired side effect: the size of compressed blocks still grows with transaction volume. Instead, we propose Dino, a new block transmission protocol. Once a node receives a Dino block, it can recover the original block from that Dino block and the transactions in its transaction pool. Since Dino transmits block construction rules instead of compressed block content, it scales well to blocks with larger transaction volumes. We deploy Dino in Bitcoin and BCH to compare it with the Compact, XThin, and Graphene protocols. For a block with 3,000 transactions, its Dino block is no more than 1 KB in size, which is only 4% of an XThin block, 5% of a Compact block, and 20% of a Graphene block. The size of Dino blocks stays constant even when the transaction volume reaches Bitcoin's and BCH's protocol limits. Simulation experiments show that Dino scales well with higher transaction generation rates and can reduce block propagation latency.
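
A simplified picture of transmitting a construction rule instead of block contents: the sender describes how the block was assembled (here, a hypothetical "fee-rate threshold, sorted by txid" rule) plus a digest, and the receiver re-derives the block from its own transaction pool and checks the digest. Dino's actual rules and its fallback for missing transactions are not shown.

```python
# Hypothetical rule-based block reconstruction; the selection rule is illustrative only.
import hashlib

def block_digest(txids):
    return hashlib.sha256("".join(txids).encode()).hexdigest()

def make_dino_block(pool, min_fee_rate):
    """pool: dict txid -> fee_rate; the 'block' is just a rule plus a digest."""
    txids = sorted(txid for txid, fee_rate in pool.items() if fee_rate >= min_fee_rate)
    return {"min_fee_rate": min_fee_rate, "digest": block_digest(txids)}

def recover_block(dino_block, pool):
    txids = sorted(txid for txid, fee_rate in pool.items()
                   if fee_rate >= dino_block["min_fee_rate"])
    if block_digest(txids) != dino_block["digest"]:
        raise ValueError("local pool differs; fall back to requesting missing transactions")
    return txids
```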

Enabling Low-latency-capable Satellite-Ground Topology for Emerging LEO Satellite Networks

Yaoying Zhang, Qian Wu, Zeqi Lai and Hewu Li (Tsinghua University, China)

The network topology design is critical for achieving low latency and high capacity in future integrated satellite and terrestrial networks (ISTNs). However, existing studies mainly focus on the design of the inter-satellite topology of an ISTN, and very little is known about the design of the satellite-ground topology and its impact on the attainable network performance.
In this paper, we conduct a quantitative study of the impact of various satellite-ground designs on the network performance of ISTNs. We identify that the high-density and high-dynamicity characteristics of emerging mega-constellations impose big challenges, causing significant routing instability, low network reachability, and high latency and jitter over ISTN paths. To alleviate these challenges, we formulate the Low-latency Satellite-Ground Interconnecting (LSGI) problem, which aims to integrate the space and ground segments of the ISTN while minimizing the maximum transmission latency and keeping routing stable. We further design algorithms to solve the LSGI problem by wisely coordinating the establishment of ground-to-satellite links among distributed ground stations. Comprehensive experimental results demonstrate that our solution outperforms related schemes with about 19% reduction in latency and 70% reduction in jitter on average, while sustaining the highest network reachability among them.

SPACERTC: Unleashing the Low-latency Potential of Mega-constellations for Real-Time Communications

Zeqi Lai, Weisen Liu, Qian Wu and Hewu Li (Tsinghua University, China); Jingxi Xu (Tencent, China); Jianping Wu (Tsinghua University, China)

User-perceived latency is important for the quality of experience (QoE) of wide-area real-time communications (RTC). This paper explores a futuristic yet important problem facing the RTC community: can we exploit emerging mega-constellations to facilitate low-latency RTC globally? We carry out our quest in three steps. First, through a measurement study involving a large number of geo-distributed RTC users, we quantitatively expose that the meandering routes over the client-cloud segment and the inter-cloud-site segment in the existing cloud-based RTC architecture are critical culprits behind the high latency suffered by wide-area RTC sessions. Second, after analyzing the low-latency potential enabled by mega-constellations, we formulate the dynamic RTC latency minimization problem under an integrated space-ground environment, and propose SPACERTC, a satellite-cloud cooperative framework that adaptively selects relay servers on satellites and cloud sites to build an overlay network offering diverse close-to-optimal paths, and then judiciously allocates RTC flows in the network to facilitate low-latency interactions. Finally, we implement a hardware-in-the-loop experimental environment based on public constellation information and RTC traces, and extensive experiments demonstrate that SPACERTC can deliver near-optimal interactive latency, with up to 64.9% latency reduction compared with state-of-the-art cloud-based solutions under representative videoconferencing traffic.

Torp: Full-Coverage and Low-Overhead Profiling of Host-Side Latency

Xiang Chen (Zhejiang University, Peking University, and Fuzhou University, China); Hongyan Liu (Zhejiang University, China); Junyi Guo (Peking University, China); Xinyue Jiang (Zhejiang University, China); Qun Huang (Peking University, China); Dong Zhang (Fuzhou University, China); Chunming Wu and Haifeng Zhou (Zhejiang University, China)

In data center networks (DCNs), host-side packet processing accounts for a large portion of the end-to-end latency of TCP flows. Thus, the profiling of host-side latency anomalies has been considered a crucial part of DCN performance diagnosis and troubleshooting. In particular, such profiling requires full coverage (i.e., profiling every TCP packet handled by end-hosts) and low overhead (i.e., profiling should avoid high CPU consumption on end-hosts). However, existing solutions rely entirely on end-hosts to implement host-side latency profiling, leading to limited coverage or high overhead. In this paper, we propose Torp, a framework that offers full-coverage and low-overhead profiling of host-side latency. Our key idea is to offload profiling operations to top-of-rack (ToR) switches, which inherently offer full coverage and line-rate packet processing performance. Specifically, Torp selectively offloads profiling operations to the ToR switch with respect to switch limitations. It efficiently coordinates the ToR switch and end-hosts to execute the entire latency profiling task. We have implemented Torp on a testbed comprising 32×100 Gbps Tofino switches. Testbed experiments indicate that Torp achieves full coverage and orders-of-magnitude lower host-side overhead compared with existing solutions.

Session Chair

Stenio Fernandes (Service Now)
